63 research outputs found

    Identification of conserved regulatory elements by comparative genome analysis

    BACKGROUND: For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed 'phylogenetic footprinting'. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments. RESULTS: We have devised a flexible suite of methods for the identification and visualization of conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach results in a significant improvement in the detection of transcription-factor-binding sites because of an increased signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is at the disposal of the scientific community at . CONCLUSIONS: Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting
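The filtering idea in this abstract (keep only predicted sites that occur as pairs at equivalent alignment positions within conserved regions) can be illustrated with a minimal sketch. It is not ConSite's implementation: the `Site` type, the function names, the `min_identity` cutoff of 0.7, and the choice to measure conservation over each site's own alignment columns rather than over a sliding window are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    """A predicted binding site (hypothetical record type)."""
    factor: str  # transcription factor name
    start: int   # 0-based start in the ungapped sequence
    end: int     # exclusive end in the ungapped sequence


def seq_to_aln_coords(aligned: str) -> List[int]:
    """Map each ungapped sequence index to its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def column_identity(aln_a: str, aln_b: str, start_col: int, end_col: int) -> float:
    """Fraction of alignment columns in [start_col, end_col) with identical bases."""
    matches = sum(
        1
        for c in range(start_col, end_col)
        if aln_a[c] == aln_b[c] and aln_a[c] != "-"
    )
    return matches / max(1, end_col - start_col)


def conserved_site_pairs(
    aln_a: str,
    aln_b: str,
    sites_a: List[Site],
    sites_b: List[Site],
    min_identity: float = 0.7,  # assumed cutoff, not taken from the paper
) -> List[Tuple[Site, Site]]:
    """Keep same-factor site pairs that occupy the same alignment columns
    and whose columns are sufficiently conserved."""
    map_a = seq_to_aln_coords(aln_a)
    map_b = seq_to_aln_coords(aln_b)
    pairs = []
    for sa in sites_a:
        cols_a = (map_a[sa.start], map_a[sa.end - 1] + 1)
        for sb in sites_b:
            if sb.factor != sa.factor:
                continue
            cols_b = (map_b[sb.start], map_b[sb.end - 1] + 1)
            if cols_a != cols_b:
                continue  # not at equivalent positions in the alignment
            if column_identity(aln_a, aln_b, *cols_a) >= min_identity:
                pairs.append((sa, sb))
    return pairs
```

Requiring identical alignment columns is the strictest reading of "equivalent positions"; a real tool would likely tolerate small offsets and score conservation over a wider window.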

    Digital transcriptome profiling of normal and glioblastoma-derived neural stem cells identifies genes associated with patient survival.

    BACKGROUND: Glioblastoma multiforme, the most common type of primary brain tumor in adults, is driven by cells with neural stem (NS) cell characteristics. Using derivation methods developed for NS cells, it is possible to expand tumorigenic stem cells continuously in vitro. Although these glioblastoma-derived neural stem (GNS) cells are highly similar to normal NS cells, they harbor mutations typical of gliomas and initiate authentic tumors following orthotopic xenotransplantation. Here, we analyzed GNS and NS cell transcriptomes to identify gene expression alterations underlying the disease phenotype. METHODS: Sensitive measurements of gene expression were obtained by high-throughput sequencing of transcript tags (Tag-seq) on adherent GNS cell lines from three glioblastoma cases and two normal NS cell lines. Validation by quantitative real-time PCR was performed on 82 differentially expressed genes across a panel of 16 GNS and 6 NS cell lines. The molecular basis and prognostic relevance of expression differences were investigated by genetic characterization of GNS cells and comparison with public data for 867 glioma biopsies. RESULTS: Transcriptome analysis revealed major differences correlated with glioma histological grade, and identified misregulated genes of known significance in glioblastoma as well as novel candidates, including genes associated with other malignancies or glioma-related pathways. This analysis further detected several long non-coding RNAs with expression profiles similar to neighboring genes implicated in cancer. Quantitative PCR validation showed excellent agreement with Tag-seq data (median Pearson r = 0.91) and discerned a gene set robustly distinguishing GNS from NS cells across the 22 lines. 
These expression alterations include oncogene and tumor suppressor changes not detected by microarray profiling of tumor tissue samples, and facilitated the identification of a GNS expression signature strongly associated with patient survival (P = 1e-6, Cox model). CONCLUSIONS: These results support the utility of GNS cell cultures as a model system for studying the molecular processes driving glioblastoma and the use of NS cells as reference controls. The association between a GNS expression signature and survival is consistent with the hypothesis that a cancer stem cell component drives tumor growth. We anticipate that analysis of normal and malignant stem cells will be an important complement to large-scale profiling of primary tumors. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are

    Exploring hepatic hormone actions using a compilation of gene expression profiles

    BACKGROUND: Microarray analysis is attractive within the field of endocrine research because regulation of gene expression is a key mechanism whereby hormones exert their actions. Knowledge discovery and hypothesis testing based on information-rich expression profiles promise to accelerate discovery of physiologically relevant hormonal mechanisms of action. However, most studies so far concentrate on the actions of single hormones, and few examples exist that attempt to use compilations of different hormone-regulated expression profiles to gain insight into how hormones act to regulate tissue physiology. This report illustrates how a meta-analysis of multiple transcript profiles obtained from a single tissue, the liver, can be used to evaluate relevant hypotheses and discover novel mechanisms of hormonal action. We have evaluated the differential effects of growth hormone (GH) and estrogen in the regulation of hepatic gender-differentiated gene expression, as well as the involvement of sterol regulatory element-binding proteins (SREBPs) in the hepatic actions of GH and thyroid hormone. RESULTS: Little similarity exists between liver transcript profiles regulated by 17-α-ethinylestradiol and those induced by the continuous infusion of bGH. On the other hand, strong correlations were found between both profiles and the female-enriched transcript profile. Therefore, estrogens have feminizing effects in male rat liver which are different from those induced by GH. The similarities between the bGH and T3 profiles were limited to a small group of genes, most of which are involved in lipogenesis. An in silico promoter analysis of genes rapidly regulated by thyroid hormone predicted the activation of SREBPs by short-term treatment in vivo. It was further demonstrated that proteolytic processing of SREBP1 in the endoplasmic reticulum might contribute to the rapid actions of T3 on these genes. 
CONCLUSION: This report illustrates how a meta-analysis of multiple transcript profiles can be used to link knowledge concerning endocrine physiology to hormonally induced changes in gene expression. We conclude that both GH and estrogen are important determinants of gender-related differences in hepatic gene expression. The rapid hepatic effects of thyroid hormone act on genes involved in lipogenesis, possibly through the induction of SREBP1 proteolytic processing

    In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

    Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation

    Complex Loci in Human and Mouse Genomes

    Mammalian genomes harbor a larger than expected number of complex loci, in which multiple genes are coupled by shared transcribed regions in antisense orientation and/or by bidirectional core promoters. To determine the incidence, functional significance, and evolutionary context of mammalian complex loci, we identified and characterized 5,248 cis–antisense pairs, 1,638 bidirectional promoters, and 1,153 chains of multiple cis–antisense and/or bidirectionally promoted pairs from 36,606 mouse transcriptional units (TUs), along with 6,141 cis–antisense pairs, 2,113 bidirectional promoters, and 1,480 chains from 42,887 human TUs. In both human and mouse, 25% of TUs resided in cis–antisense pairs, only 17% of which were conserved between the two organisms, indicating frequent species specificity of antisense gene arrangements. A sampling approach indicated that over 40% of all TUs might actually be in cis–antisense pairs, and that only a minority of these arrangements are likely to be conserved between human and mouse. Bidirectional promoters were characterized by variable transcriptional start sites and an identifiable midpoint at which overall sequence composition changed strand and the direction of transcriptional initiation switched. In microarray data covering a wide range of mouse tissues, genes in cis–antisense and bidirectionally promoted arrangement showed a higher probability of being coordinately expressed than random pairs of genes. In a case study on homeotic loci, we observed extensive transcription of nonconserved sequences on the noncoding strand, implying that the presence rather than the sequence of these transcripts is of functional importance. Complex loci are ubiquitous, host numerous nonconserved gene structures and lineage-specific exonification events, and may have a cis-regulatory impact on the member genes
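As a rough illustration of what detecting a cis–antisense pair involves, the sketch below pairs transcriptional units on opposite strands of the same chromosome whose exons overlap. It is a toy under stated assumptions, not the authors' pipeline: the `TU` record, the half-open coordinate convention, and the all-against-all comparison are hypothetical choices, and the exon-sharing criterion is only one reading of "shared transcribed regions".

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class TU(NamedTuple):
    """A transcriptional unit (hypothetical record type)."""
    name: str
    chrom: str
    strand: str                        # "+" or "-"
    exons: Tuple[Tuple[int, int], ...]  # half-open [start, end) intervals


def exons_overlap(a: TU, b: TU) -> bool:
    """True if any exon of a shares at least one base with any exon of b."""
    return any(
        s1 < e2 and s2 < e1
        for s1, e1 in a.exons
        for s2, e2 in b.exons
    )


def cis_antisense_pairs(tus: List[TU]) -> List[Tuple[str, str]]:
    """Return name pairs of TUs on opposite strands with overlapping exons."""
    return [
        (a.name, b.name)
        for a, b in combinations(tus, 2)
        if a.chrom == b.chrom and a.strand != b.strand and exons_overlap(a, b)
    ]
```

The quadratic comparison is only suitable for small inputs; at the scale of tens of thousands of TUs, a sorted sweep or an interval index per chromosome would be the more practical choice.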

    Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs

    The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species

    Mammalian MicroRNA Prediction through a Support Vector Machine Model of Sequence and Structure

    BACKGROUND: MicroRNAs (miRNAs) are endogenous small noncoding RNA gene products, on average 22 nt long, found in a wide variety of organisms. They play important regulatory roles by targeting mRNAs for degradation or translational repression. There are 377 known mouse miRNAs and 475 known human miRNAs in the May 2007 release of the miRBase database, the majority of which are conserved between the two species. A number of recent reports imply that it is likely that many mammalian miRNAs remain to be discovered. The possibility that there are more of them expressed at lower levels or in more specialized expression contexts calls for the exploitation of genome sequence information to accelerate their discovery. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we describe a computational method, mirCoS, that uses three support vector machine models sequentially to discover new miRNA candidates in mammalian genomes based on sequence, secondary structure, and conservation. mirCoS can efficiently detect the majority of known miRNAs and predicts an extensive set of hairpin structures based on human-mouse comparisons. In total, 3476 mouse candidates and 3441 human candidates were found. These hairpins are more similar to known miRNAs than to negative controls in several aspects not considered by the prediction algorithm. A significant fraction of predictions is supported by existing expression evidence. CONCLUSIONS/SIGNIFICANCE: Using a novel approach, mirCoS performs comparably to or better than existing miRNA prediction methods, and contributes a significant number of new candidate miRNAs for experimental verification

    Gene complexes and regulatory domains in metazoan genomes

    Despite the recent massive increases in genome and transcript sequence data, including whole-genome sequences for humans and many other metazoans, our understanding of the content of these sequences is far from complete. This thesis is about making use of metazoan sequence data to detect functional genetic elements on a genome-wide scale and examine the distribution of those elements on chromosomes. Specifically, the thesis focuses on the occurrence of gene complexes, such as pairs of overlapping genes, and on chromosomal regulatory domains of importance in development and disease. Mammalian genomes contain a larger than expected number of complex loci, in which genes on opposite strands share transcribed regions, exons and/or core promoters. We find that, in both human and mouse genomes, 25% of transcriptional units (TUs) share exon sequence with a TU on the opposite strand. The true proportion is likely to be significantly higher because transcriptomes are not fully sequenced. Intriguingly, most pairs of overlapping TUs consist of one coding and one noncoding TU. We have included a large dataset of transcript sequences from such noncoding TUs in a database of noncoding RNA (http://research.imb.uq.edu.au/RNAdb). While nearly a thousand cases of overlapping TU arrangements are conserved between human and mouse, these constitute only 17% of all detected TU overlaps, suggesting that many species-specific arrangements exist. Taking advantage of newly available CAGE tag data on transcription start site locations, we analyze bidirectional promoters and show that their divergent transcription initiation regions are broad and often separated only by a small region (<60 bp) at which overall sequence composition changes strand. Vertebrate, insect and nematode genomes contain an abundance of highly conserved noncoding elements (HCNEs) that appear to function as enhancers for developmental regulatory genes around which they cluster. 
We show evidence that large blocks of conserved synteny (genomic regulatory blocks, GRBs) have been maintained, across vertebrates and across insects, to keep arrays of HCNEs intact. GRBs often contain bystander genes whose functions and expression patterns are unrelated to those of the presumptive target genes of HCNE enhancer activity. By analyzing the fate of duplicated genes and HCNEs after whole-genome duplication in teleosts, we show that bystander genes are indeed independent of the regulatory input of HCNE arrays. In addition, we describe differences in core promoters between target genes and bystander genes that might explain the differences in their responsiveness to long-range enhancers. We present a web resource (http://ancora.genereg.net) for exploring the distribution of HCNEs on metazoan chromosomes. Together with other recent studies, this work challenges the canonical colinear model of how genes and their regulatory elements are arranged in metazoan genomes. Vertebrate and insect genomes appear to contain an abundance of nested and overlapping gene structures, giving rise to both coding and noncoding transcripts. In addition, regulatory elements controlling the expression of a gene are frequently distributed within or beyond other genes. These findings should be taken into account in future studies of regulation of gene expression and effects of genetic variation by considering the genomic neighborhood of genes and polymorphisms of interest, up to distances on the order of a million base pairs in the human genome